Abstract. A growing number of astronomical resources 
and data or information services are made available 
through the Internet. However, valuable information is
frequently hidden in a deluge of irrelevant or out-of-date
documents. At a first level, compilations of astronomical
resources provide help for selecting relevant
sites. Combining yellow-page services and meta-databases 
of active pointers may be an efficient solution to the data 
retrieval problem. Responses generated by submission of 
queries to a set of heterogeneous resources are difficult 
to merge or cross-match, because different data providers 
generally use different data formats: new endeavors are 
under way to tackle this problem. We review the techni- 
cal challenges involved in trying to provide general search 
and discovery tools, and to integrate them through upper 
level interfaces. 

Key words: Astronomical databases: miscellaneous 



1. Introduction 

How to help the users find their way through the jungle of
information services is a question which has been raised
since the early development of the WWW (see e.g., Egret
1994), when it became clear that a big centralized system
was not the efficient way to go.

Obviously the World Wide Web is a very powerful 
medium for the development of distributed resources: on 
the one hand the WWW provides a common medium for 
all information providers - the language is flexible enough 
so that it does not bring unbearable constraints on exist- 
ing databases - on the other hand the distributed hyper- 
textual approach opens the way to navigation and links 
between services (provided a minimum of coordination can 
be achieved). It has already been widely demonstrated
that a spirit of coordination is not out of reach
in a small community such as astronomy, largely sheltered
from commercial influence.



Searching for a resource (either already visited, or unknown
but expected), or browsing lists of existing services
in order to discover new tools of interest, implies a need
for query strategies that cannot generally be managed at
the level of a single data provider.

There is a need for road-guides pointing to the most 
useful resources, or to compilations or databases where in- 
formation can be found about these resources. Such guides 
have been made in the past, and are of very practical help 
for the novice as well as the trained user, for example:
Andernach et al. 1994, Egret & Heck 1995, Egret & Albrecht
1995, Heck 1997, Grothkopf 1995, Andernach 1999.



In the present paper our aim is to address the ques- 
tions related to the collection, integration and interfacing 
of the wealth of astronomical Internet resources, and also 
to describe some strategies that have to be developed for 
building cooperative tools which will be essential in the 
research environment of the decade to come. 



2. Compilations of astronomical Internet 
resources 

At a first level, the user looking for new sources of in- 
formation can consult compilations of existing resources. 
Examples of such databases, or yellow-page services are 
given in this section. 

2.1. The StarPages 

Star*s Family is the generic name for a collection of
directories, dictionaries and databases which has been
described in detail by Heck (1995a), who has been building
up their contents for more than twenty-five years. These
very exhaustive data sets are carefully updated and validated,
thus constituting a gold mine for professional and amateur
astronomers, and more generally for all those who are
curious about space-related activities and want to locate
existing resources.

The Star*s Family of products can be queried on-line
from the CDS Web site (Strasbourg, France) under
the generic name of StarPages^1. It includes the following
databases:

StarWorlds: a directory of astronomy, space sciences, and
related organizations (Heck et al. 1994); it includes
URLs of Web sites when available, as well as e-mail
addresses; unlike most of the services mentioned in the
present paper, it is not restricted to describing on-line
resources, but also lists directory entries for organizations
which do not provide any on-line information.

StarHeads: individual Web pages essentially of astronomers
and related space scientists (Heck 1995b).

StarBits: a very comprehensive dictionary of abbreviations,
acronyms, contractions, and symbols used in astronomy
and space sciences (Heck 1995b).



All three databases are associated with a query engine 
based on character string searches. Filters prevent extrac- 
tion of too large subsets of the database. 

2.2. AstroWeb 



AstroWeb (Jackson et al. 1994) is a collection of pointers 
to astronomically relevant information resources available 
on the Internet. The browse mode of AstroWeb opens a 
window on the efforts currently developed - in some cases, 
unfortunately, in a rather disorganized way - for making 
astronomically related, and hopefully pertinent, informa- 
tion available on-line through the World Wide Web. 

AstroWeb is maintained by a small consortium of in- 
dividuals located at CDS, STScI, MSSSO, NRAO, and 
Vilspa. The master database is currently hosted at CDS^2
(after having been for a long time at STScI), and all the 
above-mentioned places, as well as the Institute of As- 
tronomy, Cambridge, host a mirror copy with customized 
presentation of the same data. 

Each URL is checked by a robot on a daily basis to
verify that all referenced resources are still alive. The
resource descriptions are usually submitted by the person or
organization responsible for the resource, but are checked
and, if necessary, modified by one of the consortium members.
The search engine is a WAIS search index. The index is
constructed from the resource descriptions, and also includes
all the words contained in the referenced home page. This
latter feature is quite powerful for bringing new names
of projects, topics, and research groups very quickly into
the index.
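
The indexing strategy described above can be illustrated with a minimal sketch in Python. It is not the WAIS implementation used by AstroWeb: the record fields, the sample entries (with example.org addresses) and the function names are all assumptions made for illustration. The point it shows is that the index merges words from the submitted description with words from the referenced home page, so that a term appearing only on the home page still leads to the resource.

```python
import re
from collections import defaultdict

# Hypothetical resource records: a URL, the submitted description, and the
# text of the referenced home page (assumed to be fetched elsewhere).
RESOURCES = [
    {"url": "http://example.org/obs",
     "description": "Radio observatory archive",
     "home_page": "Welcome. New survey project FOOSURVEY started."},
    {"url": "http://example.org/lib",
     "description": "Astronomy library catalogue",
     "home_page": "Opening hours and preprint lists."},
]

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(resources):
    """Map each word to the set of URLs whose description or home page contains it."""
    index = defaultdict(set)
    for res in resources:
        for word in tokenize(res["description"] + " " + res["home_page"]):
            index[word].add(res["url"])
    return index

def search(index, query):
    """Return the URLs matching every word of the query (simple AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits

INDEX = build_index(RESOURCES)
```

In this sketch a query for the invented project name "FOOSURVEY" finds the first resource even though the name occurs only in the home page text, not in the description.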

Table 1 lists the resources present in the AstroWeb 
database in December 1999. 



3. Current status of on-line astronomy resources 

Following the classification scheme adopted by AstroWeb,
we will outline in this section the current status of the
main categories of on-line astronomy resources, pointing
to meta-resources (i.e. organized lists of resources) when
they are available.

Table 1. Resources listed in the AstroWeb database (December
1999). The number of resources (Web sites) in each
category is given in parentheses. A number of resources
appear in more than one category.

Organizations       Astronomy Departments (508)
                    Professional and Amateur Organizations (159)
                    Space Agencies and Organizations (46)

Observing           Observatories and Telescopes (328)
resources           Telescope Observing Schedules (25)
                    Meteorological Information (10)
                    Astronomical Survey Projects (65)

Data resources      Data and Archive Centers (145)
                    Astronomy Information Systems (39)

Abstracts,          Bibliographical Services (29)
Publications,       Astronomical Journals and Publications (90)
Libraries           Astronomy & astrophysics preprints (58)
                    Abstracts of Astronomical Publications (29)
                    Conference Proceedings (45)
                    Astronomy-related Libraries (48)
                    Other Library resources (11)

People-related      Personal Web pages (800)
Resources           People (lists) (14)
                    Jobs (37)
                    Conferences and Meetings (45)
                    Newsgroups (31)
                    Mailing Lists (16)

Software            Astronomy software servers (129)
Computer Science    Document Preparation Tools (9)
                    Overviews & technical notes for protocols (11)
                    Computer Science-related Resources (33)

Research areas      Radio Astronomy (109)
Astronomy           Optical Astronomy (178)
Space Physics       High-Energy Astronomy (77)
                    Space Astronomy (175)
                    Solar Astronomy (77)
                    Planetary Astronomy (64)
                    History of Astronomy (21)
                    Earth, Ocean, Atmosphere, Space Sciences (41)
                    Physics-related Resources (91)

Educational         Professional and Amateur Organizations (159)
resources           Educational resources (240)
                    Astronomy Pictures (105)

Miscellaneous       Primary Lists of Astronomy Resources (10)
                    Other lists of astronomy resources (78)
                    Miscellaneous Resources (137)

1 http://cdsweb.u-strasbg.fr/starpages.html

2 http://cdsweb.u-strasbg.fr/astroweb.html

3.1. Organizations 

Most of the active astronomical organizations (institutes,
astronomy departments, etc.) now have home pages on the
Internet. StarWorlds^3 is currently the most comprehensive
searchable directory of such resources; it can be queried
by names, keywords, or character strings. For browsing
lists sorted by alphabetical order or by country, see
AstroWeb (Section 2.2). National or international organizations
also maintain useful lists.

3.2. Observational projects and missions 

It is now difficult to envisage an observational project 
without a web site. As they are more dynamic and often 
involve multiple organizations or institutions, the best way 
to find them may be to use one of the powerful commercial 
search engines that routinely index millions of web pages 
on the Internet. 

The indexing system of AstroWeb may also be helpful, 
especially when it is important to limit the investigation 
domain to astronomy, or to keep track of new emerging 
projects. 



3.3. Data and information systems

Astronomy data and information centers are becoming
increasingly interconnected, with both explicit links to
other relevant resources and automatic cross-links that
may be invoked transparently to the end-user. Sections 4
and 5 describe current efforts to provide interoperability
within astrophysics (Astrobrowse) and across the space
sciences (ISAIA).

3.4. Bibliographic resources

Here also a virtual network is being organized, as exemplified
by the Urania^4 initiative, or by the coordinated efforts
to create links between ADS and other services (Kurtz et
al. 2000). Note that many of the bibliographical resources
are electronic journals for which a subscription may be
required.

3.5. People-related resources

Some databases (RGO E-mail directory^5, StarHeads^6) follow
the development of electronic mail addresses and personal
Web pages. Directories from national or international
societies (e.g., AAS, EAS, IAU) are also generally
very carefully kept up to date.

The database of meetings and conferences maintained
by CFHT^7 is very complete and well organized. Astronomical
societies also maintain their own lists.

3.6. Astronomical software

The Astronomical Software and Documentation Service
(ASDS^8) is a network service that allows users to locate
existing astronomical software, associated technical
documentation, and information about telescopes and
astronomical instrumentation (Payne et al. 1996). ASDS
originated as a service devoted entirely to astronomical
software packages and their associated on-line documentation
and was originally called the Astronomical Software
Directory Service. Much code is rewritten these days, not
because anyone has found a fundamentally better way to
solve the problem, but because developers simply don't
know who has already done it, whether the code runs on
the system they have available, or where to get it if it does.
That is the problem that ASDS was intended to solve.

In 1998 the scope of ASDS was expanded to include
astronomical observing sites and their associated telescope
and instrument manuals, taken from a listing maintained
at CFHT. The service was renamed at this point.

3 http://cdsweb.u-strasbg.fr/starworlds.html

4 http://www.aas.org/Urania/

5 http://star-www.rl.ac.uk/astrolist/astrosearch.html

6 http://cdsweb.u-strasbg.fr/starheads.html



3.7. Educational resources 

Education and public outreach have always been a strong
concern in astronomy, but the importance of this activity
is growing even faster with the advent of the World
Wide Web.

It is difficult to give general rules for such a wide field, 
going far beyond the limits of astronomical institutions. 
Let us just say that we expect to see in the future an 
increasing role of educational institutions (planetariums, 
or outreach departments of big societies or institutions), 
for conveying general astronomy knowledge, or news about 
recent discoveries, to the general public. 

The yellow-page services mentioned above do keep lists 
of the most important education services. 

4. Towards a global index of astronomical 
resources 

In the following we will focus on Internet resources that 
actually provide data, of any kind, as opposed to those 
describing or documenting an institution or a research 
project, without giving access to any data set or archive. 

One main trend is certainly the increase of interconnections
between distributed on-line services, the 'Weaving of
the Astronomy Web' (which was the title of a Conference
organized in Strasbourg by Egret & Heck 1995).

7 http://cadcwww.dao.nrc.ca/meetings/meetings.html

8 http://asds.stsci.edu/

More generally, with the development of the Internet, 
and of a large number of on-line services giving access to 
data or information, it is clear that tools giving coordi- 
nated access to distributed services are needed. This is, 
for instance, the concern expressed by NASA through the
Astrobrowse project (Heikkila et al. 1999).

In this section we will first describe a tool for manag- 
ing a "metadata" dictionary of astronomy information ser- 
vices (GLU); then we will show how the existence of such 
a metadatabase can be used for building efficient search 
and discovery tools. 

4.1. The CDS GLU 

The CDS (Centre de Données astronomiques de Strasbourg)
has recently developed a tool for managing remote
links in a context of distributed heterogeneous services
(GLU^9, Générateur de Liens Uniformes, i.e. Uniform Link
Generator; Fernique et al. 1998).



First developed for ensuring efficient interoperability
of the several services existing at CDS (VizieR, Simbad,
Aladin, bibliography, etc.; see Genova et al. 2000),
this tool has also been designed for maintaining addresses
(URLs) of distributed services (ADS, NED, etc.).

A key element of the system is the "GLU dictionary" 
maintained by the data providers contributing to the sys- 
tem, and distributed to all sites of a given domain. This 
dictionary contains knowledge about the participating ser- 
vices (URLs, syntax and semantics of input fields, descrip- 
tions, etc.), so that it is possible to generate automatically 
a correct query for submission to a remote database. 

The service provider (data center, archive manager, or 
webmaster of an astronomical institute) can use GLU for 
coding a query, taking benefit of the easy update of the 
system: knowing which service to call, and which answer 
to expect from this service, the programmer does not have 
to worry about the precise address of the remote service 
at a given time, nor of the detailed syntax of the query 
(expected format of the equatorial coordinates, etc.). 
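
The role of such a dictionary can be illustrated by a minimal Python sketch. It does not reproduce the actual GLU record syntax or any real service address: the record layout, the service names, the example.org URL templates and the `resolve` function are assumptions made for illustration. It shows a caller naming a service and supplying values, while the dictionary supplies the current address and the expected input format.

```python
from urllib.parse import quote

# Hypothetical dictionary: each record describes one service by a symbolic
# name, a URL template and per-field formatting rules. Not actual GLU syntax.
DICTIONARY = {
    "object-info": {
        "description": "Basic data for a named object",
        "url": "http://example.org/objects?name={name}",
        "fields": {"name": str.strip},
    },
    "cone-search": {
        "description": "Catalogue entries around a position",
        "url": "http://example.org/cone?ra={ra}&dec={dec}",
        # This imaginary service expects decimal degrees with 5 decimals.
        "fields": {"ra": lambda v: f"{float(v):.5f}",
                   "dec": lambda v: f"{float(v):.5f}"},
    },
}

def resolve(service, **params):
    """Build the query URL for a symbolic service name and its input values."""
    record = DICTIONARY[service]
    missing = set(record["fields"]) - set(params)
    if missing:
        raise ValueError(f"missing input fields: {sorted(missing)}")
    # Format each value as the service expects, then percent-encode it.
    values = {name: quote(fmt(params[name]), safe="")
              for name, fmt in record["fields"].items()}
    return record["url"].format(**values)
```

With this layout, a change of address or of input format only requires editing the dictionary record; code calling `resolve("cone-search", ra=..., dec=...)` is unaffected.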

4.2. New search and discovery tools 

The example of GLU demonstrates the usefulness of stor- 
ing into a database the knowledge about information ser- 
vices (their address, purpose, domain of coverage, query 
syntax, etc.). In a second step, such a database can be 
queried when the challenge is to provide information about 
who is providing what, for a given object, region of the 
sky, or domain of interest. 

Several projects are working toward providing general 
solutions. 



4.2.1. Astrobrowse 

Astrobrowse is a project that began within the United 
States astrophysics community, primarily within NASA 
data centers, for developing a user agent which signif- 
icantly streamlines the process of locating astronomical 
data on the web. Several prototype implementations are 
already available^10. With any of these prototypes, a user 
can already query thousands of resources without having 
to deal with out-of-date URLs, or spend time figuring out 
how to use each resource's unique input formats. Given 
a user's selection of web-based astronomical databases 
and an object name or coordinates, Astrobrowse will send 
queries to all databases identified as containing potentially 
relevant data. It provides links to these resources and al- 
lows the user to browse results from each query. Astro- 
browse does not recognize, however, when a query yields 
a null result, nor does it integrate query results into a 
common format to enable intercomparison. 

4.2.2. AstroGLU 

Consider the following scenario: we have a data item I 
(for example an author's name, the position or name of 
an astronomical object, a bibliographical reference, etc.), 
and we would like to know more about it, but we do not 
know a priori which service S to contact, and what are 
the different data types D which can be requested. This 
scenario is typical of a scientist exploring new domains as 
part of a research procedure. 

The GLU dictionary can actually be used for help- 
ing to solve this question: the dictionary can be consid- 
ered as a reference directory, storing the knowledge about 
all services accepting data item I as input, for retrieving
data D1 or D2. For example, we can easily obtain from
such a dictionary the list of all services accepting an au- 
thor's name as input; information which can be accessed, 
in return, may be an abstract (service ADS), a preprint 
(LANL/astro-ph), the author's address (RGO e-mail
directory) or personal Web page (StarHeads), etc.

Based on such a system, it becomes possible to create 
automatically a simple interface guiding the user towards 
any of the services described in the dictionary. 
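
A minimal sketch of this lookup, in Python, is given here under stated assumptions: the `Service` record, the list contents and the function name are illustrative and do not reflect the actual AstroGLU code or dictionary format. The author-name entries follow the examples quoted above; the Simbad entry is added only to show a second input type.

```python
from collections import namedtuple

# One record per (service, accepted input type, returned data type).
Service = namedtuple("Service", "name accepts returns")

# Illustrative entries; a real dictionary would hold many more.
SERVICES = [
    Service("ADS", "author name", "abstract"),
    Service("LANL/astro-ph", "author name", "preprint"),
    Service("RGO e-mail directory", "author name", "address"),
    Service("StarHeads", "author name", "personal Web page"),
    Service("Simbad", "object name", "basic data"),
]

def services_for(input_type):
    """Group the services accepting input_type by the data type they return."""
    result = {}
    for svc in SERVICES:
        if svc.accepts == input_type:
            result.setdefault(svc.returns, []).append(svc.name)
    return result
```

Calling `services_for("author name")` yields the menu of possible actions for a data item I of that type, from which an interface can be generated.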

This idea has been developed as a prototype tool, under
the name of AstroGLU^11 (Egret et al. 1998). The aim
of this tool is to help the users find their way among several
dozens (for the moment) of possible actions or services. A
number of compromises have to be made between providing
the user with the full information (which would be too
abundant and thus unusable), and preparing digest lists
(which implies hiding some amount of auxiliary information
and making somewhat subjective selections).

A resulting issue is the fact that the system puts on the
same level services which have very different quantitative or
qualitative characteristics. AstroGLU has no efficient way
yet to provide the user with a hierarchy of services, as a
gastronomic guide would do for restaurants. This might
come to be a necessity in the future, as more and more
services become (and remain) available.

9 http://simbad.u-strasbg.fr/glu/glu.htx

10 http://heasarc.gsfc.nasa.gov/ab/

11 http://simbad.u-strasbg.fr/glu/cgi-bin/astroglu.pl



5. Towards an integration of distributed data and 
information services 

To go further, one needs to be able to integrate the results
of queries provided by heterogeneous services. This
is the aim of the ISAIA (Integrated System for Archival
Information Access) project^12 (Hanisch 2000a, 2000b).

The key objective of the project is to develop an in- 
terdisciplinary data location and integration service for 
space sciences. Building upon existing data services and 
communications protocols, this service will allow users to 
transparently query a large variety of distributed hetero- 
geneous Web-based resources (catalogs, data, computa- 
tional resources, bibliographic references, etc.) from a sin- 
gle interface. The service will collect responses from vari- 
ous resources and integrate them in a seamless fashion for 
display and manipulation by the user. 

Because the scope of ISAIA is intended to span the 
space sciences - astrophysics, planetary science, solar 
physics, and space physics - it is necessary to find a way 
to standardize the descriptions of data attributes that are 
needed in order to formulate queries. The ISAIA approach 
is based on the concept of profiles. Profiles map generic 
concepts and terms onto mission or dataset specific at- 
tributes. Users may make general queries across multiple 
disciplines by using the generic terms of the highest level 
profile, or make more specific queries within subdisciplines 
using terms from more detailed subprofiles. 

The profiles play three critical and interconnected 
roles: 



1. They identify appropriate resources (catalogs, mission 
datasets, bibliographic databases): the resource profile 

2. They enable generic queries to be mapped unambigu- 
ously onto resource-specific queries: the query profile 

3. They enable query responses to be tagged by content 
type and integrated into a common presentation for- 
mat: the response profile 

The resource, query, and response profiles are all aspects of 
a common database of resource attributes. Current plans 
call for these profiles to be expressed using XML (eXtensible
Markup Language, an emerging standard which allows
embedding of logical markup tags within a document) and 
to be maintained as a distributed database using the CDS 
GLU facility. 
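
The three roles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text above foresees XML profiles maintained through GLU, whereas this sketch uses plain dictionaries, and the resource names, generic terms and attribute names are all invented.

```python
# Hypothetical profiles: for each resource, the disciplines it covers and
# the mapping from generic terms to its own attribute names.
PROFILES = {
    "mission_a": {
        "covers": {"astrophysics"},
        "attributes": {"position_ra": "RA_J2000",
                       "position_dec": "DEC_J2000",
                       "time": "OBS_DATE"},
    },
    "catalog_b": {
        "covers": {"astrophysics", "solar physics"},
        "attributes": {"position_ra": "alpha",
                       "position_dec": "delta"},
    },
}

def select_resources(discipline, terms):
    """Resource profile: resources covering the discipline and every query term."""
    return sorted(name for name, prof in PROFILES.items()
                  if discipline in prof["covers"]
                  and set(terms) <= set(prof["attributes"]))

def to_specific(resource, query):
    """Query profile: rewrite a generic query with resource-specific attributes."""
    attrs = PROFILES[resource]["attributes"]
    return {attrs[term]: value for term, value in query.items()}

def to_generic(resource, row):
    """Response profile: tag returned fields by generic term, dropping unmapped ones."""
    reverse = {v: k for k, v in PROFILES[resource]["attributes"].items()}
    return {reverse[key]: value for key, value in row.items() if key in reverse}
```

Because responses from different resources come back keyed by the same generic terms, they can be merged into a common presentation without the providers changing their own attribute names.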

The profile concept is critical to a distributed data 
service where one cannot expect data providers to modify 
their internal systems or services to accommodate some
externally imposed standard. The profiles act as a thin,
lightweight interface between the distributed service and 
the existing specific services. Ideally the service-specific 
profile implementations are maintained in a fully dis- 
tributed fashion, with each data or service provider run- 
ning a GLU daemon in which that site's services are fully 
described and updated as necessary. Static services or ser- 
vices with insufficient staff resources to maintain a local 
GLU implementation can still be included, however, as 
long as their profiles are included elsewhere in the dis- 
tributed resource database. The profile concept is not 
unique to space science, but would apply equally well to 
any distributed data service in which a common user in- 
terface is desired to locate information in related yet tra- 
ditionally separate disciplines. 

6. Information clustering and advanced user 
interfaces 

A major challenge in current information systems research 
is to find efficient ways for users to be able to visualize 
the contents and understand the correlations within large 
databases. The technologies being developed are likely to 
be applicable to astronomical information systems. For ex- 
ample, information retrieval by means of "semantic road 
maps" was first detailed in Doyle (1961), using a powerful
spatial metaphor which lends itself quite well to modern
distributed computing environments such as the Web.
The Kohonen self-organizing feature map (SOM; Kohonen
1982) method is an effective means towards this end
of a visual information retrieval user interface.



6.1. Interfacing datasets with a Self-organizing Map 

The Kohonen map is, at heart, k-means clustering with
the additional constraint that cluster centers be located on
a regular grid (or some other topographic structure) and
furthermore their location on the grid be monotonically
related to pairwise proximity (Murtagh & Hernandez-Pajares
1995).

12 http://heasarc.gsfc.nasa.gov/isaia/



A regular grid is quite convenient for an output rep- 
resentation space, as it maps conveniently onto a visual 
user interface. In a web context, it can easily be made 
interactive and responsive. 
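
To make the grid constraint concrete, here is a minimal self-organizing map in pure Python. It is a generic textbook-style sketch, not the implementation behind the maps described in this paper; the function names, the learning-rate and radius schedules, and the default parameters are assumptions chosen for brevity.

```python
import math
import random

def best_unit(weights, vec):
    """Grid coordinates of the node whose weight vector is closest to vec."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(vec, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(data, rows, cols, epochs=50, seed=0):
    """Train a rows x cols map on data; return weights[r][c] as lists."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs                  # decays from 1 towards 0
        rate = 0.5 * frac                            # learning rate
        radius = max(rows, cols) / 2.0 * frac + 0.5  # neighborhood width
        for vec in data:
            br, bc = best_unit(weights, vec)
            for r in range(rows):
                for c in range(cols):
                    # Nodes close to the winner on the grid move the most.
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += rate * h * (vec[i] - w[i])
    return weights
```

After training, `best_unit(weights, vec)` gives the grid cell of a document vector; listing the documents that fall in each cell is what a clickable map node would display.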

Fig. 1 shows an example of such a visual and interactive
user interface map, in the context of a set of journal
articles described by their keywords. Color is related to
density of document clusters located at regularly spaced
nodes of the map, and some of these nodes/clusters are
annotated. The map is installed on the Web as a clickable
image map, with CGI programs accessing lists of documents
and - through further links - in many cases, the full
documents. In the example shown, the user has queried a node
and results are seen in the right-hand panel. Such maps
are maintained for (currently) 12,000 articles from the
Astrophysical Journal, 7000 from Astronomy and Astrophysics,
over 2000 astronomical catalogs, and other data
holdings. More information on the design of this visual
interface and user assessment can be found in Poinçot et
al. (1998, 2000).

Fig. 1. Visual interactive user interface to a set of articles
from the journal Astronomy and Astrophysics. Original in
color.



6.2. Hyperlink clustering 



Guillaume & Murtagh (2000) have recently developed a
Java-based visualization tool for hyperlink-based data,
in XML, consisting of astronomers, astronomical object
names, article titles, and possibly other objects (images,
tables, etc.). Through weighting, the various types of links
could be prioritized. An iterative refinement algorithm was
developed to map the nodes (objects) to a regular grid of
cells, which, as for the Kohonen SOM, are clickable
and provide access to the data represented by the cluster.
Fig. 2 shows an example for an astronomer (Prof. Jean
Heyvaerts, Strasbourg Astronomical Observatory).
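
The details of that iterative refinement algorithm are not given here, so the following Python sketch should be read as a generic stand-in and not as the method of Guillaume & Murtagh. It assumes a simple cost (link weight times Manhattan distance between grid cells) and a pairwise-swap refinement; the function names and the cost definition are illustrative choices.

```python
import itertools

def layout_cost(pos, links):
    """Sum over links of weight times Manhattan distance between the two cells."""
    return sum(w * (abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1]))
               for a, b, w in links)

def refine_layout(nodes, links, cols):
    """Place nodes on a grid row by row, then swap pairs while the cost decreases."""
    pos = {n: divmod(i, cols) for i, n in enumerate(nodes)}  # (row, col) cells
    cost = layout_cost(pos, links)
    improved = True
    while improved:
        improved = False
        for a, b in itertools.combinations(nodes, 2):
            pos[a], pos[b] = pos[b], pos[a]          # try exchanging two cells
            new_cost = layout_cost(pos, links)
            if new_cost < cost:
                cost, improved = new_cost, True      # keep the improvement
            else:
                pos[a], pos[b] = pos[b], pos[a]      # undo the swap
    return pos, cost
```

Giving a larger weight to one type of link (for instance author to article) pulls those nodes into adjacent cells first, which is one simple way to express the prioritization by weighting mentioned above. The loop stops because each accepted swap strictly lowers the cost.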

These new cluster-based visual user interfaces are not
computationally demanding. In general they cannot be
created in real time, but they are scalable in the sense
that many tens of thousands of documents or other objects
can be easily handled. The motivation is less document
management (see e.g. Cartia^13) than the interactive
user interface.

Further information on these visual user interfaces can
be found in Guillaume (2000) and Poinçot (1999).

13 http://www.cartia.com/

Fig. 2. Visual interactive user interfaces, based on graph
edges. Vertices are author names, article titles and (not
shown here) astronomical object names. Map for astronomer
Jean Heyvaerts. Original in color.

6.3. Future developments for advanced interfaces 

Two directions of development are planned in the near 
future. Firstly, visual user interfaces need to be coupled 
together. A comprehensive "master" map is one possibil- 
ity, but this has the disadvantage of centralized control 
and/or configuration control. Another possibility is to de- 
velop a protocol such that a map can refer a user to other 
maps in appropriate circumstances. Such a protocol was 
developed a number of years ago in a system called In- 
gridp 1 ] developed by P. Francis at NTT Software Labs in 
Tokyo (see Guillaume 2000). However this work has been 
reoriented since then. 

Modern middleware tools may offer a solution: defining
an information sharing bus which will connect distributed
information maps. It will be interesting to look at the
advantages of CORBA (Common Object Request Broker
Architecture) or, more likely, EJB (Enterprise Java Beans),
for ensuring this interoperability infrastructure
(Lunney & McCaughey 2000).

A second development path is to note the clustering 
which is at the core of these visual user interfaces and to 
ask whether this can be further enhanced to facilitate con- 
struction of query and response agents. It is clear to any- 
one who uses Internet search engines such as AltaVista, 
Lycos, etc. that clustering of results is very desirable. A 
good example of such clustering of search results in practice
is the Ask Jeeves search engine^15. The query interface,
additionally, is a natural language one, another plus.



7. Conclusion 

The on-line "Virtual Observatory" is currently under
construction with on-line archives and services potentially
giving access to a huge quantity of scientific information:
its services will allow astronomers to select the information
of interest for their research, and to access original
data, observatory archives and results published in journals.
Search and discovery tools currently in development
will be of vital importance to make all the observational
data and information available to the widest scientific
community.

14 http://www.ingrid.org/

15 http://www.askjeeves.com/